Bayesian analysis of matrix data with rstiefel

Author

Peter D. Hoff∗

Abstract

We illustrate the use of the R package rstiefel for matrix-variate data analysis in the context of two examples. The first example considers estimation of a reduced-rank mean matrix in the presence of normally distributed noise. The second example considers the modeling of a social network of friendships among teenagers. Bayesian estimation for these models requires the ability to simulate from the matrix-variate von Mises-Fisher and Bingham distributions on the Stiefel manifold.

∗ Departments of Statistics and Biostatistics, University of Washington, Seattle, WA 98195-4322. This research was supported in part by NICHD grant 1R01HD067509-01A1.

1 Exponential families on the Stiefel manifold

The set of m × R matrices U for which UᵀU = I_R is called the m × R Stiefel manifold and is denoted V_{R,m}. The densities of a quadratic exponential family on this manifold (with respect to the uniform measure) are given by

p(U|A,B,C) ∝ etr(CᵀU + BUᵀAU),   (1)

where etr(X) = exp(tr(X)), C ∈ ℝ^{m×R}, B is an R × R diagonal matrix and A is a symmetric matrix. Since UᵀU = I, the density is unchanged under transformations of the form A → A + aI or B → B + bI. Additionally, it is convenient to restrict the diagonal entries of B to be in decreasing order. If B is not ordered in this way, there exists a reparameterization (A, B̃, C̃) giving the same distribution as (A,B,C) but where B̃ has ordered diagonal entries. More details on the Stiefel manifold and these distributions can be found in Chikuse (2003), Hoff (2009a), Hoff (2009b) and the references therein.

Distributions of this form were originally studied in the case R = 1, so that the manifold was just the surface of the m-sphere. In this case, B reduces to a scalar and can be absorbed into the matrix A. The quadratic exponential family then has densities of the form

p(u|c,A) ∝ exp(cᵀu + uᵀAu).   (2)

The case that A = 0 was studied by von Mises, Fisher and Langevin, and so a distribution with density proportional to exp(cᵀu) is often called a von Mises-Fisher or Langevin distribution on the sphere. The case that c = 0 and A ≠ 0 was studied by Bingham (1974), and is called the Bingham distribution. This distribution has "antipodal symmetry," in that p(u|A) = p(−u|A), and so may be appropriate as a model for random axes rather than random directions. In recognition of the work of the above-mentioned authors, we refer to distributions with densities given by (2) and (1) as vector-variate and matrix-variate Bingham-von Mises-Fisher distributions, respectively. This is a rather long name, however, so in this vignette I will refer to them as BMF distributions. The case that A (or B) is the zero matrix will be referred to as an MF distribution, and the case that C is zero will be referred to as a Bingham distribution. More descriptive names might be L, Q and LQ in place of MF, Bingham and BMF, respectively, the idea being that "L" and "Q" refer to the presence of linear and quadratic components of the density.

2 Model-based SVD

It is often useful to model an m × n rectangular matrix-variate dataset Y as being equal to some reduced-rank matrix M plus i.i.d. noise, so that Y = M + E, with the elements {ε_{i,j} : 1 ≤ i ≤ m, 1 ≤ j ≤ n} of E assumed to be i.i.d. with zero mean and some unknown variance σ². The singular value decomposition states that any rank-R matrix M can be expressed as M = UDVᵀ, where U ∈ V_{R,m}, V ∈ V_{R,n} and D is an R × R diagonal matrix.
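As a quick numerical illustration of this decomposition (a sketch with arbitrary small dimensions; the rstiefel function rustiefel is introduced formally in the next section), a rank-R matrix built from orthonormal factors is recovered exactly by svd():

> ## illustrative check: build a rank-R matrix from orthonormal factors,
> ## then recover it from its singular value decomposition
> library(rstiefel)
> m<-6 ; n<-5 ; R<-2
> U<-rustiefel(m,R) ; V<-rustiefel(n,R)  # random orthonormal factors
> D<-diag(sort(rexp(R),decreasing=TRUE))
> M<-U%*%D%*%t(V)                        # a rank-R matrix
> s<-svd(M)
> max(abs(M - s$u[,1:R]%*%diag(s$d[1:R])%*%t(s$v[,1:R])))  # essentially zero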
If we are willing to assume normality of the errors, the model can then be written as

Y = UDVᵀ + E,   E = {ε_{i,j} : 1 ≤ i ≤ m, 1 ≤ j ≤ n} ∼ i.i.d. normal(0, σ²).

Bayesian rank selection for this model was considered in Hoff (2007). In this vignette we consider estimation for a specified rank R, in which case the unknown parameters in the model are {U, D, V, σ²}. Given a suitable prior distribution over these parameters, Bayesian inference can proceed via construction of a Markov chain with stationary distribution equal to the conditional distribution of the parameters given Y, i.e. the distribution with density p(U, D, V, σ²|Y). In particular, conjugate prior distributions allow the construction of a Markov chain via the Gibbs sampler, which iteratively simulates each parameter from its full conditional distribution.

If the prior distribution for U is uniform on V_{R,m}, then its full conditional density is given by

p(U|Y, D, V, σ²) ∝ p(Y|U, D, V, σ²) ∝ etr(−[Y − UDVᵀ]ᵀ[Y − UDVᵀ]/(2σ²)) ∝ etr([YVD/σ²]ᵀU),

which is the density of an MF(YVD/σ²) distribution. Similarly, the full conditional distribution of V under a uniform prior is MF(YᵀUD/σ²). For this vignette, we will use the following prior distributions for {d₁, …, d_R, σ²}:

{d₁, …, d_R | τ²} ∼ i.i.d. normal(0, τ²)
1/τ² ∼ gamma(η₀/2, η₀τ₀²/2)
1/σ² ∼ gamma(ν₀/2, ν₀σ₀²/2).

The corresponding full conditional distributions are

{d_j | U, V, Y, d₋ⱼ, σ², τ²} ∼ normal(τ²u_jᵀYv_j/[σ² + τ²], τ²σ²/[τ² + σ²])
{1/τ² | U, D, V, Y, σ²} ∼ gamma([η₀ + R]/2, [η₀τ₀² + Σ_j d_j²]/2)
{1/σ² | U, D, V, Y, τ²} ∼ gamma([ν₀ + mn]/2, [ν₀σ₀² + ||Y − UDVᵀ||²]/2).

2.1 Simulated data

We now randomly generate some parameters and data according to the model above:

> library(rstiefel)
> set.seed(1)
> m<-60 ; n<-40 ; R0<-4
> U0<-rustiefel(m,R0)
> V0<-rustiefel(n,R0)
> D0<-diag(sort(rexp(R0),decreasing=TRUE))*sqrt(m*n)
> M0<-U0%*%D0%*%t(V0)
> Y<-M0 + matrix(rnorm(n*m),m,n)

The only command from the rstiefel package used here is rustiefel, which generates a uniformly distributed random orthonormal matrix. Note that rustiefel(m,R) gives a matrix with m rows and R columns, so its arguments appear in the reverse of their order in the symbolic representation of the manifold V_{R,m}.

2.2 Gibbs sampler

Now we try to recover the true values of the parameters {U0, V0, D0, σ²} from the observed data Y. Just for fun, let's estimate these parameters with a presumed rank R > R0 that is larger than the actual rank. Equivalently, we can think of U0, V0 and D0 as having dimensions m × R, n × R and R × R, but with the last R − R0 diagonal entries of D0 being zero. The prior distributions for U and V are uniform on their respective manifolds. We set our hyperparameters for the other priors as follows:

> nu0<-1 ; s20<-1  #inverse-gamma prior for the error variance s2
> eta0<-1 ; t20<-1 #inverse-gamma prior for the variance t2 of the sing vals

Construction of a Gibbs sampler requires starting values for all (but one) of the unknown parameters. A natural choice is the MLE:

> R<-6
> tmp<-svd(Y) ; U<-tmp$u[,1:R] ; V<-tmp$v[,1:R] ; D<-diag(tmp$d[1:R])
> s2<-var(c(Y-U%*%D%*%t(V)))
> t2<-mean(diag(D^2))

Let's compare the MLE of D to the true value:

> d.mle<-diag(D)
> d.mle

[1] 40.05172 25.00226 19.70827 13.43382 13.10381 12.64942

> diag(D0)

[1] 38.514216 24.015791 17.352783  1.169442

The values of the MLE are, as expected, larger than the true values, especially for the smaller values of D0.
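Each of the full conditional distributions derived above translates directly into a line or two of R. As a sketch (the helper name update_d is mine, not part of rstiefel), the conjugate normal update for the singular values could be written as:

> ## one draw of d from its full conditional, as derived above
> ## (update_d is an illustrative name, not an rstiefel function)
> update_d<-function(U,V,Y,s2,t2)
+ {
+   vd<-1/(1/s2+1/t2)                 # conditional variance t2*s2/(t2+s2)
+   ed<-vd*diag(t(U)%*%Y%*%V)/s2      # conditional mean t2*u_j'Y v_j/(s2+t2)
+   rnorm(ncol(U),ed,sqrt(vd))
+ }

The sampler below inlines exactly these computations.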
Now let's see if the Bayes estimate provides some shrinkage:

> MPS<-matrix(0,m,n) ; DPS<-NULL
> for(s in 1:2500)
+ {
+   U<-rmf.matrix(Y%*%V%*%D/s2)
+   V<-rmf.matrix(t(Y)%*%U%*%D/s2)
+
+   vd<-1/(1/s2+1/t2)
+   ed<-vd*(diag(t(U)%*%Y%*%V)/s2)
+   D<-diag(rnorm(R,ed,sqrt(vd)))
+
+   s2<-1/rgamma(1, (nu0+m*n)/2, (nu0*s20 + sum((Y-U%*%D%*%t(V))^2))/2)
+   t2<-1/rgamma(1, (eta0+R)/2, (eta0*t20 + sum(D^2))/2)
+
+   ### save output
+   if(s%%5==0)
+   {
+     DPS<-rbind(DPS,sort(abs(diag(D)),decreasing=TRUE))
+     M<-U%*%D%*%t(V)
+     MPS<-MPS+M
+   }
+ }

This runs a Gibbs sampler for 2500 iterations. Here, we save the values of D every 5th iteration, resulting in a sample of D-values of size 500 with which to estimate p(D|Y). Additionally, we can obtain a posterior mean estimate of M0 = U0D0V0ᵀ via the sample average of UDVᵀ. Note that

[Figure: trace plot of the sorted absolute values of D across the saved Gibbs samples (x-axis: iteration/5, y-axis: D).]
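To see the shrinkage numerically, one can compare the posterior mean estimate of M0 with the rank-R MLE. This is a sketch continuing the code above; the names M.pm and M.mle are mine, for illustration:

> ## posterior mean of M versus the rank-R least-squares fit (illustrative)
> M.pm<-MPS/nrow(DPS)                                # average of saved U D V^T draws
> M.mle<-tmp$u[,1:R]%*%diag(d.mle)%*%t(tmp$v[,1:R])  # rank-R truncated SVD of Y
> mean((M.pm -M0)^2)   # error of the Bayes estimate
> mean((M.mle-M0)^2)   # error of the MLE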
